Applications of graph theory to an English rhyming corpus

Author

  • Morgan Sonderegger
Abstract

How much can we infer about the pronunciation of a language – past or present – by observing which words its speakers rhyme? This paper explores the connection between pronunciation and network structure in sets of rhymes. We consider the rhyme graphs corresponding to rhyming corpora, where nodes are words and edges are observed rhymes. We describe the graph G corresponding to a corpus of ∼12000 rhymes from English poetry written c. 1900, and find a close correspondence between graph structure and pronunciation: most connected components show community structure that reflects the distinction between full and half rhymes. We build classifiers for predicting which components correspond to full rhymes, using a set of spectral and non-spectral features. Feature selection gives a small number (1–5) of spectral features, with accuracy and F-measure of ∼90%, reflecting that positive components are essentially those without any good partition. We partition components of G via maximum modularity, giving a new graph, G′, in which the “quality” of components, by several measures, is much higher than in G. We discuss how rhyme graphs could be used for historical pronunciation reconstruction.
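To make the rhyme-graph idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy rhyme pairs, the function names (`build_rhyme_graph`, `connected_components`, `modularity`) and the hand-picked partition are all assumptions made for illustration. It builds a graph with words as nodes and observed rhymes as edges, finds its connected components, and compares the modularity of two partitions. It does not search for the maximum-modularity partition, nor does it compute the spectral features mentioned in the abstract.

```python
from collections import defaultdict, deque

def build_rhyme_graph(rhymes):
    """Adjacency map: word -> set of words it was observed rhyming with."""
    graph = defaultdict(set)
    for a, b in rhymes:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def connected_components(graph):
    """Connected components of an undirected graph, found by breadth-first search."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            word = queue.popleft()
            component.add(word)
            for neighbor in graph[word]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def modularity(graph, partition):
    """Newman modularity Q of a partition (disjoint node sets) of an unweighted graph."""
    m = sum(len(neighbors) for neighbors in graph.values()) / 2  # number of edges
    q = 0.0
    for community in partition:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in community for v in graph[u] if v in community) / 2
        degree_sum = sum(len(graph[u]) for u in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Hypothetical toy corpus: two groups of full rhymes joined by one half rhyme
# (love/move), plus an unrelated pair.
rhymes = [
    ("love", "dove"), ("dove", "above"), ("love", "above"),
    ("move", "prove"), ("prove", "remove"), ("move", "remove"),
    ("love", "move"),
    ("day", "say"),
]

graph = build_rhyme_graph(rhymes)
components = connected_components(graph)

# Partition 1: the connected components themselves.
q_components = modularity(graph, components)

# Partition 2: split the large component at the half-rhyme edge (chosen by hand).
split = [{"love", "dove", "above"}, {"move", "prove", "remove"}, {"day", "say"}]
q_split = modularity(graph, split)

print(len(components), q_components, q_split)
```

In this toy example the hand-made split scores a higher modularity than the unsplit components, which mirrors the abstract's point that a component mixing full and half rhymes tends to have a good partition, while a component of full rhymes only does not.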

Similar resources

Rhyming Compounds as Elements of a Language Game (In Russian and English Languages)

The article studies composite rhyming compounds as a means of word-formation games. It explores the place of this category of words in the lexical system and the peculiarities of their use in Russian and English. The authors treat compound words as a special lexical subgroup. On the specific publicistic material are revealed the peculiarities of compo...

Concordance-Based Data-Driven Learning Activities and Learning English Phrasal Verbs in EFL Classrooms

In spite of the highly beneficial applications of corpus linguistics in language pedagogy, it has not found its way into mainstream EFL. The major reasons seem to be the teachers’ lack of training and the unavailability of resources, especially computers in language classes. Phrasal verbs have been shown to be a problematic area of learning English as a foreign language due to their semantic op...

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research has been done on clustering English documents. The question now arises: can those results be extended to other languages? Since the goal of document clustering is grouping of docum...

Pronouncing "the" as "thee" to signal problems in speaking.

In spontaneous speaking, the is normally pronounced as thuh, with the reduced vowel schwa (rhyming with the first syllable of about). But it is sometimes pronounced as thiy, with a nonreduced vowel (rhyming with see). In a large corpus of spontaneous English conversation, speakers were found to use thiy to signal an immediate suspension of speech to deal with a problem in production. Fully 81% ...

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor "weeping is a means of liberating contained emotions" is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

Journal:
  • Computer Speech & Language

Volume 25

Publication date: 2011